Search results for "Mel-frequency cepstrum"

Showing 10 of 13 documents

Image-Evoked Affect and Its Impact on EEG-Based Biometrics

2019

Electroencephalography (EEG) signals provide a representation of the brain’s activity patterns and have recently been exploited for user identification and authentication due to their uniqueness and their robustness to interception and artificial replication. Nevertheless, such signals are commonly affected by the individual’s emotional state. In this work, we examine the use of images as stimuli for acquiring EEG signals and study whether images that evoke similar emotional responses lead to higher identification accuracy than images that evoke different emotional responses. Results show that identification accuracy increases when the system is trained with EEG recordin…

Keywords: Biometrics; Computer science; Speech recognition; Electroencephalography; Stimulus (physiology); Statistical classification; Task analysis; Mel-frequency cepstrum. Published in: 2019 IEEE International Conference on Image Processing (ICIP).

Speech Emotion Recognition method using time-stretching in the Preprocessing Phase and Artificial Neural Network Classifiers

2020

Human emotions play a significant role in the understanding of human behaviour. There are multiple ways of recognizing human emotions, and one of them is through human speech. This paper presents an approach for designing a Speech Emotion Recognition (SER) system for an industrial training station. While assembling a product, the end user’s emotions can be monitored and used as a parameter for adapting the training station. The proposed method uses a phase vocoder for time-stretching and an Artificial Neural Network (ANN) for classification of five typical emotions. As input for the ANN classifier, features like Mel Frequency Cepstral Coefficients (MFCCs), short-te…
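The MFCC features this abstract relies on can be sketched end-to-end in a few lines of NumPy. This is a minimal illustration of the generic MFCC pipeline (framing, power spectrum, mel filterbank, log, DCT-II), not the preprocessing used in the paper; the sample rate, window size, hop and filter counts below are illustrative choices.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_coeffs=13):
    """Minimal MFCC extraction: frame -> power spectrum -> mel filterbank -> log -> DCT-II."""
    # Frame the signal with a Hann window.
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frames.append(signal[start:start + n_fft] * np.hanning(n_fft))
    frames = np.array(frames)
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Triangular mel filterbank between 0 Hz and the Nyquist frequency.
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fbank[i, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:
            fbank[i, c:r] = (r - np.arange(c, r)) / (r - c)
    # Log mel energies, then DCT-II to decorrelate -> cepstral coefficients.
    logmel = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_mels))
    return logmel @ dct.T

# One second of a 440 Hz tone as a stand-in for a speech recording.
t = np.linspace(0, 1, 16000, endpoint=False)
feats = mfcc(np.sin(2 * np.pi * 440 * t))
print(feats.shape)  # (frames, coefficients)
```

The resulting (frames × coefficients) matrix is what a classifier such as the paper's ANN would consume, typically after per-utterance normalisation.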

Keywords: Artificial neural network; Computer science; Speech recognition; Phase vocoder; Audio time-scale/pitch modification; Preprocessor; Mel-frequency cepstrum; Emotion recognition; Classifier (UML); Speech rate. Published in: 2020 IEEE 16th International Conference on Intelligent Computer Communication and Processing (ICCP).

ASR performance prediction on unseen broadcast programs using convolutional neural networks

2018

In this paper, we address a relatively new task: prediction of ASR performance on unseen broadcast programs. We first propose a heterogeneous French corpus dedicated to this task. Two prediction approaches are compared: a state-of-the-art performance prediction based on regression (engineered features) and a new strategy based on convolutional neural networks (learnt features). We particularly focus on the combination of textual (ASR transcription) and signal inputs. While the joint use of textual and signal features did not work for the regression baseline, the combination of inputs for CNNs leads to the best WER prediction performance. We also show that our CNN prediction remarkably …

Keywords: Computation and Language (cs.CL); Computer science; Speech recognition; Feature extraction; Convolutional neural network; Task analysis; Performance prediction; Mel-frequency cepstrum; Transcription (software); Hidden Markov model.

Environment Sound Classification using Multiple Feature Channels and Attention based Deep Convolutional Neural Network

2020

In this paper, we propose a model for the Environment Sound Classification (ESC) task that consists of multiple feature channels given as input to a Deep Convolutional Neural Network (CNN) with an attention mechanism. The novelty of the paper lies in using multiple feature channels consisting of Mel-Frequency Cepstral Coefficients (MFCC), Gammatone Frequency Cepstral Coefficients (GFCC), the Constant Q-transform (CQT) and the Chromagram. Such a combination of features has not been used before for signal or audio processing. Moreover, we employ a deeper CNN (DCNN) than previous models, consisting of spatially separable convolutions working on the time and feature domains separately. Alongside, we use atten…
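The multiple-feature-channel idea above amounts to stacking independently normalised 2-D feature matrices along a channel axis before feeding them to a 2-D CNN, much like colour channels in an image. A minimal NumPy sketch follows; the random arrays are placeholders standing in for MFCC, GFCC, CQT and chromagram matrices, and the shapes and z-score normalisation are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

# Hypothetical per-channel features for one audio clip, each (frames, bins).
rng = np.random.default_rng(0)
frames, bins = 128, 64
mfcc_f = rng.standard_normal((frames, bins))
gfcc_f = rng.standard_normal((frames, bins))
cqt_f = rng.standard_normal((frames, bins))
chroma = rng.standard_normal((frames, bins))

# Normalise each feature type independently, then stack along a leading
# channel axis, giving the (channels, frames, bins) layout a 2-D CNN expects.
channels = []
for f in (mfcc_f, gfcc_f, cqt_f, chroma):
    channels.append((f - f.mean()) / (f.std() + 1e-8))
x = np.stack(channels)
print(x.shape)  # (4, frames, bins)
```

In practice the four feature extractors produce different bin counts, so each matrix would be resampled or padded to a common shape before stacking.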

Keywords: Machine Learning (cs.LG, stat.ML); Sound (cs.SD); Audio and Speech Processing (eess.AS); Convolutional neural network; Audio signal processing; Pattern recognition; Feature (computer vision); Benchmark (computing); Mel-frequency cepstrum; Communication channel.

Low-Power Audio Keyword Spotting using Tsetlin Machines

2021

The emergence of Artificial Intelligence (AI)-driven Keyword Spotting (KWS) technologies has revolutionized human-to-machine interaction. Yet, the challenges of end-to-end energy efficiency, memory footprint and system complexity in current Neural Network (NN)-powered AI-KWS pipelines have remained ever present. This paper evaluates KWS using a learning-automata-based machine learning algorithm called the Tsetlin Machine (TM). Through a significant reduction in parameter requirements, and by choosing logic- over arithmetic-based processing, the TM offers new opportunities for low-power KWS while maintaining high learning efficacy. In this paper we explore a TM-based keyword spotting (KWS) pipe…

Keywords: Speech command; Keyword spotting; Machine learning; Learning automata; Tsetlin Machine; MFCC; Scalability; Memory footprint; Pervasive AI; Artificial neural network; Mel-frequency cepstrum; Energy efficiency.

A case study on feature sensitivity for audio event classification using support vector machines

2016

Automatic recognition of multiple acoustic events is an interesting problem in machine listening that generalizes the classical speech/non-speech or speech/music classification problem. Typical audio streams contain a diversity of sound events that carry important and useful information about the acoustic environment and context. Classification is usually performed by means of hidden Markov models (HMMs) or support vector machines (SVMs) using traditional feature sets based on Mel-frequency cepstral coefficients (MFCCs) and their temporal derivatives, as well as the energy from auditory-inspired filterbanks. However, while these features are routinely used by many systems, it is not …
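The temporal derivatives (deltas) of MFCCs mentioned above are conventionally computed by regression over a short symmetric window of frames. Below is a minimal NumPy sketch of that standard formula, not the feature extractor used in this study; the window width and feature dimensions are illustrative.

```python
import numpy as np

def delta(feat, width=2):
    """Temporal derivative of a feature matrix (frames on axis 0) using the
    standard regression formula over a +/-width frame window, edge-padded."""
    padded = np.pad(feat, ((width, width), (0, 0)), mode="edge")
    num = sum(k * (padded[width + k:len(feat) + width + k] -
                   padded[width - k:len(feat) + width - k])
              for k in range(1, width + 1))
    return num / (2 * sum(k * k for k in range(1, width + 1)))

# Random placeholder MFCCs: 100 frames x 13 coefficients.
mfccs = np.random.default_rng(1).standard_normal((100, 13))
d1 = delta(mfccs)                    # first derivative (delta)
d2 = delta(d1)                       # second derivative (delta-delta)
feats = np.hstack([mfccs, d1, d2])   # typical 39-dimensional frame vector
print(feats.shape)
```

Concatenating statics, deltas and delta-deltas yields the 39-dimensional per-frame vectors commonly fed to HMM or SVM classifiers.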

Keywords: Machine listening; Event (computing); Speech recognition; Feature extraction; Context (language use); Pattern recognition; Support vector machine; Feature (machine learning); Mel-frequency cepstrum; Hidden Markov model. Published in: 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP).

Single-channel EEG-based subject identification using visual stimuli

2021

Electroencephalography (EEG) signals have recently been proposed as a biometrics modality due to some inherent advantages over traditional biometric approaches. In this work, we studied the performance of individual EEG channels for the task of subject identification in the context of EEG-based biometrics, using a recently proposed benchmark dataset that contains EEG recordings acquired under various visual and non-visual stimuli with a low-cost consumer-grade EEG device. Results showed that specific EEG electrodes provide consistently higher identification accuracy regardless of the feature and stimulus types used, while features based on the Mel Frequency Cepstral Coefficients (MFCC) provi…

Keywords: Modality (human–computer interaction); Biometrics; Feature extraction; Pattern recognition; Context (language use); Electroencephalography; Identification (information); Feature (computer vision); Mel-frequency cepstrum. Published in: 2021 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI).

MFCC-based Recurrent Neural Network for automatic clinical depression recognition and assessment from speech

2022

Clinical depression or Major Depressive Disorder (MDD) is a common and serious medical illness. In this paper, a deep Recurrent Neural Network-based framework is presented to detect depression and to predict its severity level from speech. Low-level and high-level audio features are extracted from audio recordings to predict the 24 scores of the Patient Health Questionnaire and the binary class of depression diagnosis. To overcome the problem of the small size of Speech Depression Recognition (SDR) datasets, expanding training labels and transferred features are considered. The proposed approach outperforms the state-of-the-art approaches on the DAIC-WOZ database with an overall accura…

Keywords: Modality (human–computer interaction); Mean squared error; Speech recognition; Biomedical Engineering; Health Informatics; Patient Health Questionnaire; Recurrent neural network; Signal Processing; Major depressive disorder; Mel-frequency cepstrum; Depression (differential diagnoses). Published in: Biomedical Signal Processing and Control.

On the Robustness of Deep Features for Audio Event Classification in Adverse Environments

2018

Deep features, responses to complex input patterns learned within deep neural networks, have recently shown great performance in image recognition tasks, motivating their use for audio analysis tasks as well. These features provide multiple levels of abstraction, which make it possible to select a sufficiently generalized layer to identify classes not seen during training. The generalization capability of such features is very useful given the lack of fully labeled audio datasets. However, as opposed to classical hand-crafted features such as Mel-frequency cepstral coefficients (MFCCs), the performance impact of an acoustically adverse environment has not been evaluated in detail. In this p…

Keywords: Reverberation; Noise measurement; Feature extraction; Convolutional neural network; Raw audio format; Robustness (computer science); Audio analyzer; Mel-frequency cepstrum. Published in: 2018 14th IEEE International Conference on Signal Processing (ICSP).

Embedded Knowledge-based Speech Detectors for Real-Time Recognition Tasks

2006

Speech recognition has become common in many application domains, from dictation systems for professional practices to vocal user interfaces for people with disabilities or hands-free system control. However, so far the performance of automatic speech recognition (ASR) systems is comparable to human speech recognition (HSR) only under very strict working conditions, and is in general much lower. Incorporating acoustic-phonetic knowledge into ASR design has been proven a viable approach to raising ASR accuracy. Manner-of-articulation attributes such as vowel, stop, fricative, approximant, nasal, and silence are examples of such knowledge. Neural networks have already been used successfully as de…

Keywords: Voice activity detection; Artificial neural network; Dictation; Speech recognition; Speech technology; Speech processing; Manner of articulation; Silence; Vowel; Mel-frequency cepstrum; Speech detector; User interface; Natural language processing.